-
Notifications
You must be signed in to change notification settings - Fork 15
Bug fix/output img #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…号を付与します。__init__.py で、Excel COM から各シートを個別PDFとして書き出し、PDFページを順にレンダリングして保存する実装に変更しています。
📝 WalkthroughWalkthroughAdds subprocess-capable PDF-to-PNG rendering for Excel sheet images, changes export_sheet_images/export_pdf flows to per-sheet PDF export and subprocess or in-process rasterization, updates docs and release notes, and extends tests for both rendering paths and naming/aggregation behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller (process_excel / export_sheet_images)
participant Excel as Excel COM / xlwings
participant Main as Main Process (renderer)
participant Worker as Subprocess Worker
participant PDF as pypdfium2 / PDF document
participant Disk as Disk (PNG output)
Caller->>Excel: Export per-sheet PDF(s)
Excel-->>Main: PDF path(s)
Main->>Main: Check EXSTRUCT_RENDER_SUBPROCESS
alt Subprocess enabled
Main->>Worker: Send job (PDF path, pages, DPI, outdir)
Worker->>PDF: Load PDF
Worker->>PDF: Render pages -> bitmaps
Worker->>Disk: Write PNG files
Worker-->>Main: Return PNG paths or error
Main->>Caller: Aggregate results/errors
else In-process
Main->>PDF: Load PDF
Main->>PDF: Render pages -> bitmaps
Main->>Disk: Write PNG files
Main->>Caller: Return PNG paths
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @tests/com/test_render_smoke.py:
- Around line 39-58: The test test_render_multiple_print_ranges_images asserts
exactly 4 image files but is brittle because multi-page sheets produce extra
files with _p02/_p03 suffixes; update the assertion in
test_render_multiple_print_ranges_images to allow multi-page output (e.g.,
change assert len(images) == 4 to assert len(images) >= 4) or replace the check
by counting unique sheet image prefixes (strip trailing _pNN suffixes from each
filename and assert the number of unique prefixes is 4) so the test passes
regardless of per-sheet page splits.
🧹 Nitpick comments (3)
src/exstruct/render/__init__.py (3)
86-91: Consider clarifying the pdfium validation intent.In subprocess mode (lines 89-90),
_require_pdfium()is called solely to validate the dependency exists before spawning subprocesses, but the result is discarded. While functionally correct, adding a brief comment would clarify this intent.💡 Suggested clarification
if not use_subprocess: pdfium = cast(Any, _require_pdfium()) else: - _require_pdfium() + _require_pdfium() # Validate dependency before spawning subprocesses
174-198: Consider adding a timeout to prevent indefinite hangs.The subprocess join (line 189) has no timeout, which means the parent process could block indefinitely if the subprocess hangs or deadlocks (e.g., due to a corrupted PDF or resource exhaustion). This could degrade user experience or cause the application to become unresponsive.
💡 Suggested improvement
process.start() - process.join() + process.join(timeout=300) # 5-minute timeout + if process.is_alive(): + process.terminate() + process.join(timeout=5) + raise RenderError("PDF rendering subprocess timed out") if not queue.empty(): result = queue.get()
201-229: Code duplication with_render_pdf_pages_in_process.The rendering logic (scale calculation, page iteration, bitmap rendering, PNG saving with page suffix) is duplicated between this worker and
_render_pdf_pages_in_process. While the duplication is partly justified due to different contexts (subprocess vs in-process), consider extracting the core rendering loop into a shared helper to reduce maintenance burden.Also note: Line 214 creates
output_dirdefensively, but the parent process already ensures this directory exists at line 85. This redundancy is harmless but could be removed.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
tests/assets/multiple_print_ranges_4sheets.xlsxis excluded by!**/*.xlsxtests/assets/sample.xlsis excluded by!**/*.xlsuv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
docs/agents/FEATURE_SPEC.mddocs/agents/TASKS.mdsrc/exstruct/render/__init__.pytests/com/test_render_smoke.pytests/render/test_render_init.py
💤 Files with no reviewable changes (1)
- docs/agents/FEATURE_SPEC.md
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values following mypy strict mode
UseAnytype only at external library boundaries (xlwings, pandas, numpy) and avoid type inference from external library internals
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Follow import ordering: standard library, third-party, exstruct internal modules
Add Google-style docstrings to all functions and classes
Keep cyclomatic complexity below 12 per function and split complex functions into smaller ones
Avoid writing God Functions (large monolithic functions) and God Objects (large classes with many responsibilities)
Avoid unnecessary nesting and deeply nested conditional branches
Ensure code passes mypy strict mode with zero errors
Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid circular dependencies between modules
Normalize external library data (xlwings, pandas) to Pydantic models at the boundary layer
Use the ExStruct system as specified in core development; do not write code dependent on external library internal structures
Files:
tests/render/test_render_init.pysrc/exstruct/render/__init__.pytests/com/test_render_smoke.py
🧠 Learnings (1)
📚 Learning: 2026-01-06T13:53:35.863Z
Learnt from: CR
Repo: harumiWeb/exstruct PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-06T13:53:35.863Z
Learning: Reference architectural documentation in docs/architecture/pipeline.md, docs/contributors/architecture.md, docs/agents/CODING_GUIDELINES.md, docs/agents/DATA_MODEL.md, and docs/agents/TASKS.md as needed during development
Applied to files:
docs/agents/TASKS.md
🧬 Code graph analysis (2)
src/exstruct/render/__init__.py (2)
src/exstruct/io/__init__.py (1)
_sanitize_sheet_filename(96-99)src/exstruct/errors.py (1)
RenderError(28-29)
tests/com/test_render_smoke.py (1)
src/exstruct/__init__.py (1)
process_excel(318-398)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (11)
docs/agents/TASKS.md (1)
5-8: LGTM!The task documentation clearly captures the implementation strategy for subprocess-based rendering, memory isolation, and error handling patterns that align with the code changes in this PR.
src/exstruct/render/__init__.py (5)
4-5: LGTM!New imports for multiprocessing and os are correctly placed with the standard library imports.
44-49: LGTM!The addition of
display_alerts = FalseandSaveAsto convert the workbook to xlsx before PDF export correctly addresses the xls file compatibility issue and prevents blocking dialogs during automation.
144-146: LGTM!The environment variable toggle is implemented correctly with sensible defaults (subprocess enabled by default) and case-insensitive comparison for disabling.
149-171: LGTM!The in-process rendering function is well-structured with proper type hints, context management for the PDF document, and correct page numbering logic.
140-141: These functions have different implementations and behavior—consolidation would break the render module.The
_sanitize_sheet_filenameinsrc/exstruct/render/__init__.pydiffers from the io module's version. The render version calls.strip()on the result, ensuring that whitespace-only input returns"sheet"(as verified by the test at line 380 in test_render_init.py). The io version omits.strip(), which means it would return the whitespace string unchanged. Importing from io would break the expected behavior in the render module.No action required.
tests/render/test_render_init.py (5)
16-126: LGTM!The fake classes are well-extended to support the new rendering functionality:
FakeSheetApiproperly simulates per-sheet PDF exportFakeBookApi.SaveAssupports the xls-to-xlsx conversion flowFakePdfDocument.__len__enables multi-page testing scenarios
158-230: LGTM!The monkeypatch targets are correctly updated to patch
xw.Appdirectly (the imported module reference) rather than through the render module's namespace.
254-274: LGTM!The test comprehensively validates the in-process rendering path including sheet name sanitization, whitespace fallback, and multi-page suffix generation.
335-366: LGTM!The test effectively validates that the subprocess rendering path is correctly delegated when
EXSTRUCT_RENDER_SUBPROCESS=1. The mock approach properly isolates the delegation logic from actual subprocess spawning.
369-374: LGTM!The test validates the core toggle behavior. Consider adding edge case tests for
"false"and"FALSE"values to verify case-insensitivity, though this is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6c66ec7ea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
tests/render/test_render_init.py (1)
395-396: Consider removing unusedempty()method.The
empty()method is defined but never used in the test suite. Per the commit messages, the implementation moved away fromQueue.empty()toqueue.get(timeout=...), making this method dead code in the test double.♻️ Suggested removal
- def empty(self) -> bool: - return self.payload is Nonesrc/exstruct/render/__init__.py (3)
188-189: Add timeout toprocess.join()to prevent indefinite hangs.If the worker process deadlocks or hangs (e.g., due to a pypdfium2 bug or large PDF),
process.join()will block indefinitely. Consider adding a timeout and terminating the process if it exceeds the limit.♻️ Suggested fix
process.start() - process.join() + process.join(timeout=300) # 5 minute timeout for large PDFs + if process.is_alive(): + process.terminate() + process.join(timeout=5) + raise RenderError("PDF rendering subprocess timed out") result = _get_subprocess_result(queue)
208-236: Consider extracting shared rendering logic to reduce duplication.The core rendering logic (lines 220-232) is nearly identical to
_render_pdf_pages_in_process(lines 158-170). While extraction is complicated by the subprocess boundary and import semantics, you could extract a helper that accepts thepdfiummodule as a parameter.That said, the current duplication is acceptable given the subprocess isolation requirements and the relatively small code surface.
191-195: Type narrowing could improve safety.The type
dict[str, list[str] | str]doesn't distinguish between the "paths" and "error" keys. Ifpathswere accidentally a string, the iteration would yield characters. Consider adding runtime validation or usingTypedDictfor stricter typing.♻️ Example with TypedDict
from typing import TypedDict class WorkerResult(TypedDict, total=False): paths: list[str] error: strOr add a runtime check:
paths = result.get("paths", []) + if not isinstance(paths, list): + raise RenderError(f"Invalid worker result: paths is not a list") return [Path(path) for path in paths]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
docs/release-notes/v0.3.6.mdmkdocs.ymlpyproject.tomlsrc/exstruct/render/__init__.pytests/com/test_render_smoke.pytests/render/test_render_init.py
✅ Files skipped from review due to trivial changes (2)
- docs/release-notes/v0.3.6.md
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/com/test_render_smoke.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Add type hints to all function and method arguments and return values following mypy strict mode
UseAnytype only at external library boundaries (xlwings, pandas, numpy) and avoid type inference from external library internals
Return Pydantic BaseModel instances instead of dictionaries or tuples for structured data
Follow import ordering: standard library, third-party, exstruct internal modules
Add Google-style docstrings to all functions and classes
Keep cyclomatic complexity below 12 per function and split complex functions into smaller ones
Avoid writing God Functions (large monolithic functions) and God Objects (large classes with many responsibilities)
Avoid unnecessary nesting and deeply nested conditional branches
Ensure code passes mypy strict mode with zero errors
Ensure code passes Ruff checks (E, W, F, I, B, UP, N, C90 rules) with zero errors
Avoid circular dependencies between modules
Normalize external library data (xlwings, pandas) to Pydantic models at the boundary layer
Use the ExStruct system as specified in core development; do not write code dependent on external library internal structures
Files:
src/exstruct/render/__init__.pytests/render/test_render_init.py
🧬 Code graph analysis (2)
src/exstruct/render/__init__.py (3)
tests/render/test_render_init.py (10)
open(65-70)SaveAs(41-42)ExportAsFixedFormat(29-31)ExportAsFixedFormat(37-39)close(53-55)quit(81-83)render(97-99)to_pil(89-91)save(155-165)put(386-387)src/exstruct/io/__init__.py (1)
_sanitize_sheet_filename(96-99)src/exstruct/errors.py (1)
RenderError(28-29)
tests/render/test_render_init.py (2)
src/exstruct/render/__init__.py (4)
_use_render_subprocess(144-146)_render_pdf_pages_subprocess(174-195)_get_subprocess_result(198-205)_render_pdf_pages_worker(208-236)src/exstruct/errors.py (1)
RenderError(28-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test (windows-latest, 3.12)
🔇 Additional comments (23)
mkdocs.yml (1)
29-31: LGTM!The new v0.3.6 release notes entry follows the existing naming convention and is correctly placed at the top of the Release Notes list in descending version order.
tests/render/test_render_init.py (16)
1-16: LGTM!Imports are properly organized: standard library first, then third-party (
pytest,xlwings), then internal modules. The use ofcastandModuleTypefor test type handling is appropriate.
18-31: LGTM!The
FakeSheetandFakeSheetApistubs correctly simulate the xlwings Sheet interface needed for per-sheet PDF export. The implementation writes valid PDF header bytes which aligns with the test expectations.
41-42: LGTM!The
SaveAsstub correctly simulates the Excel Book.api interface for the temp workbook save inexport_pdf.
76-83: LGTM!The
display_alertsattribute correctly simulates the xlwings App interface used byexport_pdf.
102-128: LGTM!The
FakePdfDocumentstub now correctly simulates multi-page PDFs via__len__, enabling tests to verify the_pNNpage suffix naming convention. The conditional page count based on path is a pragmatic approach for test variability.
131-149: LGTM!
ExplodingPdfDocumentcorrectly simulates PDF loading failures for testing error propagation paths.
277-297: LGTM!Good test coverage for the in-process rendering path with:
- Name sanitization (
Sheet/1→Sheet_1, whitespace →sheet)- Multi-page PDF naming with
_p02suffix- Environment variable toggle for subprocess mode
300-316: LGTM!The test correctly verifies that
RenderErrorfrom_require_excel_apppropagates without being wrapped again.
338-369: LGTM!Good integration test verifying that
export_sheet_imagescorrectly delegates to the subprocess rendering path when enabled, with proper argument forwarding for each sheet.
399-418: LGTM!
FakeProcesscorrectly simulates the multiprocessing Process interface for testing subprocess coordination.
421-434: LGTM!
FakeContextcorrectly simulates the multiprocessing context interface for testing subprocess rendering coordination.
437-461: LGTM!The test correctly verifies the subprocess success path with proper queue result handling and path conversion.
464-481: LGTM!The test correctly verifies that worker errors reported via the queue are propagated as
RenderError.
484-490: LGTM!The test correctly verifies timeout handling in
_get_subprocess_result.
493-517: LGTM with a note on test isolation.The test correctly verifies the worker function's output. The
sys.modulesmanipulation forpypdfium2is cleaned up in thefinallyblock, which is good. If tests run in parallel, consider usingmonkeypatch.syspath_prependor a context manager approach for stronger isolation.
520-537: LGTM!The test correctly verifies that worker exceptions are captured and reported via the queue payload.
src/exstruct/render/__init__.py (6)
1-16: LGTM!Imports are properly organized and the addition of
multiprocessingandosmodules is appropriate for the new subprocess rendering functionality.
44-48: LGTM!Setting
display_alerts = Falseand saving to a temp file before export is defensive programming that prevents Excel dialogs from blocking automation and isolates the original workbook from modifications.
79-137: LGTM!The refactored
export_sheet_imagescorrectly:
- Conditionally loads pypdfium2 only when needed for in-process rendering
- Iterates sheets to export per-sheet PDFs before rasterization
- Delegates to subprocess or in-process rendering based on environment toggle
- Maintains proper resource cleanup with try/finally
144-146: LGTM!The function correctly defaults to subprocess rendering (safer for memory isolation) and only disables it when explicitly set to "0" or "false".
149-171: LGTM!The in-process rendering logic is correct with proper DPI-to-scale conversion and multi-page naming. However, see the comment on
_render_pdf_pages_workerregarding code duplication.
198-205: LGTM!The 5-second timeout is reasonable for queue result retrieval after the process has joined. The broad exception handling is appropriate here since all failures should be reported uniformly.
#38 の対策を実施
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.